Duration prediction using multi-level model for GPR-based speech synthesis

Authors

Decha Moungsri

Tomoki Koriyama

Takao Kobayashi

Abstract

This paper introduces frame-based Gaussian process regression (GPR) into phone/syllable duration modeling for Thai speech synthesis. The GPR model is designed to predict frame-level acoustic features from the corresponding frame information, which includes the relative position within each unit of the utterance structure and linguistic information such as tone type and part of speech. Although GPR-based prediction can be applied to a phone duration model, a phone duration model alone is not always sufficient to generate natural-sounding speech. Specifically, in some languages, including Thai, syllable durations affect the perception of sentence structure. In this paper, we propose a duration prediction technique using a multi-level model that includes both syllable and phone levels. In this technique, syllable durations are predicted first and are then used as additional contexts in the phone-level model to generate the phone durations for synthesis. Objective and subjective evaluation results show that GPR-based modeling with the multi-level duration model outperforms conventional HMM-based speech synthesis.
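The two-stage idea in the abstract (predict syllable durations first, then feed them to the phone-level model as an extra context) can be illustrated with a minimal sketch. This is not the authors' implementation: it works at the unit level rather than the frame level, uses a hand-written GPR with a fixed RBF kernel, and runs on synthetic toy data. The names `SimpleGPR`, `predict_phone_durations`, the feature layout, and all hyperparameters are assumptions made purely for illustration. It also assumes that observed syllable durations are used as the context during training and predicted ones at synthesis time.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

class SimpleGPR:
    """Hypothetical minimal GPR (posterior mean only, fixed hyperparameters)."""

    def __init__(self, length_scale=1.0, noise=1e-2):
        self.length_scale = length_scale
        self.noise = noise

    def fit(self, x, y):
        self.x_train = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        self.y_mean = y.mean()  # constant mean function
        k = rbf_kernel(self.x_train, self.x_train, self.length_scale)
        k += self.noise * np.eye(len(y))  # observation noise on the diagonal
        self.alpha = np.linalg.solve(k, y - self.y_mean)
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        k_star = rbf_kernel(x, self.x_train, self.length_scale)
        return k_star @ self.alpha + self.y_mean

def predict_phone_durations(syl_model, phone_model, syl_feats, phone_feats, phone_to_syl):
    """Stage 1: syllable durations. Stage 2: phone durations with the
    predicted duration of the parent syllable appended as a context."""
    syl_dur = syl_model.predict(syl_feats)
    phone_inputs = np.column_stack([phone_feats, syl_dur[phone_to_syl]])
    return syl_dur, phone_model.predict(phone_inputs)

# Synthetic toy data (assumption): 40 syllables with 2 phones each.
rng = np.random.default_rng(0)
n_syl = 40
syl_feats = rng.uniform(0.0, 1.0, size=(n_syl, 2))    # e.g. tone, position
syl_dur = 0.15 + 0.10 * syl_feats[:, 0] + 0.05 * syl_feats[:, 1]
phone_to_syl = np.repeat(np.arange(n_syl), 2)         # parent syllable index
phone_feats = rng.uniform(0.0, 1.0, size=(2 * n_syl, 1))
phone_dur = syl_dur[phone_to_syl] * (0.3 + 0.4 * phone_feats[:, 0])

# Training uses the observed syllable durations as the extra context.
syl_model = SimpleGPR().fit(syl_feats, syl_dur)
phone_model = SimpleGPR().fit(
    np.column_stack([phone_feats, syl_dur[phone_to_syl]]), phone_dur)

pred_syl, pred_phone = predict_phone_durations(
    syl_model, phone_model, syl_feats, phone_feats, phone_to_syl)
```

The only coupling between the two levels is the extra input column carrying the syllable duration, so the phone-level model can be swapped for any other regressor without touching the syllable-level stage.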


Similar resources

A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data

In this paper, we evaluate a framework of statistical parametric speech synthesis based on Gaussian process regression (GPR) and compare it with those based on hidden Markov model (HMM) and deep neural network (DNN). Recently, for the purpose of improving the performance of HMM-based speech synthesis, novel frameworks using deep architectures have been proposed and have shown their effectivenes...


Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis

Hidden Markov Model (HMM) based speech synthesis using Decision Tree (DT) for duration prediction is known to produce over-averaged rhythm. To alleviate this problem, this paper proposes a two-level duration prediction method together with outlier removal. This method takes advantage of the accurate regression capability of Extreme Learning Machine (ELM) for phone-level duration prediction, and th...


Analysis of Duration Prediction Accuracy in HMM-Based Speech Synthesis

Appropriate phoneme durations are essential for high quality speech synthesis. In hidden Markov model-based text-to-speech (HMM-TTS), durations are typically modeled statistically using state duration probability distributions and duration prediction for unseen contexts. Use of rich context features enables synthesis without high-level linguistic knowledge. In this paper we analyze the accuracy ...


Auto-Switch Gaussian Process Regression-based Probabilistic Soft Sensors for Industrial Multi-Grade Processes with Transitions

Prediction uncertainty has rarely been integrated into traditional soft sensors in industrial processes. In this work, a novel auto-switch probabilistic soft sensor modeling method is proposed for online quality prediction of a whole industrial multi-grade process with several steady-state grades and transitional modes. Several single Gaussian process regression (GPR) models are first construct...


Explicit duration modelling in HMM-based speech synthesis using a hybrid hidden Markov model-multilayer perceptron

In HMM-based speech synthesis, it is important to correctly model duration because it has a significant effect on the perceptual quality of speech, such as rhythm. For this reason, hidden semi-Markov model (HSMM) is commonly used to explicitly model duration instead of using the implicit state duration model of HMM through its transition probabilities. The cost of using HSMM to improve duration...




Publication date: 2015
